
lihaoyi
Contributor

@lihaoyi lihaoyi commented Sep 1, 2025

This PR demonstrates using the https://github.com/com-lihaoyi/PPrint library to handle pretty-printing of values in the REPL.

Visible improvements:

  • Data structures like sequences and case classes are now nicely formatted and indented, including deeply nested data structures, to make best use of the vertical and horizontal space available
  • Strings are consistently quoted in collections and case classes, rather than sometimes quoted and sometimes not.
  • The rendering of Seq("") and Seq() is no longer identical (sometimes!)
  • Unusual characters within strings are now properly quoted, rather than being butchered during the rendering
  • (not shown) Character literals like 'X' are properly pretty-printed with quotes
  • Adjustments to the syntax highlighting colour scheme, making it subjectively easier to read and converging it with the scheme used by pprint's own internal highlighter:
    • Unified StringColor and LiteralColor as green rather than red. This should help avoid the red of literals being visually confused with the red of error messages when pretty printing code during compilation errors, which is something I have had problems with in the past
    • Highlighted capitalized identifiers like Foo or Seq or List, since the vast majority of these identifiers are likely to be the companion objects of types, and highlighting them helps greatly in visually finding your way around pretty-printed data structures
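
To make the quoting points above concrete, here is a minimal sketch using only the standard library. It shows how plain toString (not the REPL's previous renderer, which did more than this) can print distinct values identically; the value names are purely illustrative.

```scala
// Plain toString drops string quotes, so different values can look the same.
val emptyString = Seq("").toString     // one element: the empty string
val emptySeq    = Seq().toString       // no elements at all
val nullElem    = Seq(null).toString   // one element: null
val nullString  = Seq("null").toString // one element: the string "null"

// Both pairs are indistinguishable once rendered.
val ambiguousEmpty = emptyString == emptySeq
val ambiguousNull  = nullElem == nullString
```

Quoting strings inside collections removes both ambiguities.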

Before:

Screenshot 2025-09-01 at 9 42 52 AM

After:

Screenshot 2025-09-01 at 1 41 05 PM

Notes:

  • This PR only uses PPrint for formatting and not coloring, relying on the existing REPL code that deals with syntax highlighting (with tweaks). Using PPrint's highlighter directly would require a larger refactor that can come in a follow up iff we decide to do so
  • We build pprint/fansi/sourcecode from source using sourceGenerators. This requires a bit of patching to work around -Xexplicit-nulls and -Xfatal-warnings, but otherwise is straightforward and means for all intents and purposes it's just part of the Dotty codebase. We mangle the package paths to make them dotty.shaded.* packages to avoid conflict with user code
  • The verbosity of PPrint can be configured, e.g. we can decide whether we want to print field names or not. By default it prints field names for any case class with more than 1 field
  • We can't use os-lib or requests while still supporting Java 8, as they require Java 11, so for now I just use java.io/java.nio to do the same thing. It looks super ugly, but when we start requiring Java >=17 we can clean this up

I turned off -Xfatal-warnings so I can build this on my laptop with Java 21, but we can revert that before merging. This PR would definitely need some cleanup before merging, but in principle it seems to work

TODO/Future-Work:

  • Automatically select max height/width based on terminal size, and provide a helper (similar to Ammonite's show(...)) to bypass the max height. For now, it's fixed at the default width of 100 columns
  • We can use the same approach to make use of os-lib and other libraries within scala3-compiler by building them from source
  • Make use of fansi elsewhere in the dotty codebase. e.g. the highlighting of stack traces via the code syntax highlighter is super ugly and could be cleaned up:
Screenshot 2025-09-01 at 1 09 22 PM

@lihaoyi
Contributor Author

lihaoyi commented Sep 1, 2025

CC @odersky @hamzaremmal as we discussed this when I visited Lausanne

@Gedochao
Contributor

Gedochao commented Sep 1, 2025

We can't use os-lib or requests while still supporting Java 8, as they require Java 11, so for now I just use java.io/java.nio to do the same thing. It looks super ugly, but when we start requiring Java >=17 we can clean this up

@lihaoyi we already require Java >= 17, the main branch is already set to 3.8

val downloads = Seq(
"https://repo1.maven.org/maven2/com/lihaoyi/pprint_3/0.9.3/pprint_3-0.9.3-sources.jar",
"https://repo1.maven.org/maven2/com/lihaoyi/fansi_3/0.5.1/fansi_3-0.5.1-sources.jar",
"https://repo1.maven.org/maven2/com/lihaoyi/sourcecode_3/0.4.3-M5/sourcecode_3-0.4.3-M5-sources.jar",
Contributor

...wait, is 0.4.3-M5 a stable version? 🤔

Contributor

might want to bring it to stable before we depend on it in the compiler repo 😅

Contributor Author

It's almost a year old, so I guess so haha. I can tag a stable version if you would like, but the contents of the sourcejar will be unchanged

Contributor

if it's become stabilised, by all means. 👍
I'd rather avoid milestone versions here.

Member

I agree with @Gedochao. A stable version should be used here. Also, @lihaoyi, what versioning scheme do these 3 libraries follow? I'm not a fan of cloning the sources and changing the package name. I prefer to just have a dependency and use the actual library (which we do for jline and will soon do for asm too).

Contributor Author

@lihaoyi lihaoyi Sep 1, 2025

Also, others have raised concerns in the past about using Scala libraries in the compiler codebase affecting the bootstrapping process. By building from source, we treat it effectively as Dotty's own source files, removing any divergence in the code paths: they are treated identically to scala3's own sources. If scala3 can compile itself, it should be able to compile these sources without issue

Contributor

  1. Why is it downloaded every time?
  2. There seems to be no checksum verification (e.g. MD5)?
  3. Extract the common versions into fields?
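
As a rough illustration of the checksum question above, the following sketch hashes downloaded bytes with the JDK's MessageDigest and compares the result against a pinned value. The names hexDigest and verify are hypothetical and are not part of the PR's build code; the algorithm name is a parameter, so "MD5" or "SHA-256" can be used.

```scala
import java.security.MessageDigest

/** Hex-encoded digest of `bytes` for a JDK algorithm name such as "MD5" or "SHA-256". */
def hexDigest(bytes: Array[Byte], algorithm: String): String =
  MessageDigest
    .getInstance(algorithm)
    .digest(bytes)
    .map(b => f"${b & 0xff}%02x") // mask to an unsigned value before formatting
    .mkString

/** Hypothetical guard: fail loudly when the bytes do not match the pinned digest. */
def verify(bytes: Array[Byte], expectedHex: String, algorithm: String = "SHA-256"): Array[Byte] =
  val actual = hexDigest(bytes, algorithm)
  require(
    actual.equalsIgnoreCase(expectedHex),
    s"$algorithm mismatch: expected $expectedHex, got $actual"
  )
  bytes
```

A build step could use the same check to decide whether a previously downloaded file can be reused instead of fetching it again.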

Member

We should indeed compile these from sources. We can depend on binaries for Java libraries (hence jline and asm are fine), but not for Scala libraries.

Contributor Author

We should indeed compile these from sources. We can depend on binaries for Java libraries (hence jline and asm are fine), but not for Scala libraries.

Could you explain why? I thought Scala is maintaining backwards binary/tasty compatibility. Doesn't that mean we should always be able to depend on older scala 3 jars in the scala3 compiler regardless of how much bootstrapping we do?

Member

Because circular dependencies are evil. They're not too evil across time (Av2 -> Bv1 -> Av1), but they're still difficult to reason about.

And even though Scala 3 will forever be backward compat, an eventual Scala 4 wouldn't. We shouldn't paint our build into a corner. Scala 2 tried this several times over its lifetime, and rolled back every time. It's a massive pain every time it happens. There would need to be a huge upside to depending on a binary for that to be offset.

@lihaoyi
Contributor Author

lihaoyi commented Sep 1, 2025

@Gedochao the community_build_n jobs were failing when I was using Java 11 APIs such as the java.net.HttpClient (transitively via requests-scala)

@hamzaremmal
Member

@lihaoyi I'm bumping the version required by the CB to Java 17.

@Gedochao
Contributor

Gedochao commented Sep 1, 2025

Note: I like the visible results very much, looking forward to how this PR evolves.

@tgodzik added the stat:needs decision label ("Some aspects of this issue need a decision from the maintenance team.") Sep 1, 2025
var in: InputStream = null
var zis: ZipInputStream = null
try {
in = new BufferedInputStream(conn.getInputStream)
Contributor

Use the Using resource API here?

Contributor Author

If @hamzaremmal is bumping the required version to Java 17, I'm hoping to delete all of this and it'll collapse down to a single line
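
For reference, a sketch of what the resource-handling suggestion could look like with scala.util.Using in place of the manual var/try/finally pattern. This assumes Scala 2.13+ or Scala 3 (scala.util.Using does not exist in Scala 2.12, which sbt 1.x build definitions use) and Java 9+ for readAllBytes; readZipEntries is a hypothetical helper, not code from the PR.

```scala
import java.io.InputStream
import java.util.zip.ZipInputStream
import scala.util.Using

/** Reads every file entry of a zip stream into memory.
  * Using.resource closes the stream even if reading throws.
  */
def readZipEntries(in: InputStream): Map[String, Array[Byte]] =
  Using.resource(new ZipInputStream(in)) { zis =>
    Iterator
      .continually(zis.getNextEntry) // returns null once the archive is exhausted
      .takeWhile(_ != null)
      .filterNot(_.isDirectory)
      .map(entry => entry.getName -> zis.readAllBytes()) // reads the current entry only
      .toMap
  }
```

The iterator is consumed inside the Using block, so every entry is read before the stream is closed.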

@WojciechMazur
Contributor

I think that using the existing sources with patches is fine for a POC, but before merging I'd prefer embedding the sources into this project if possible (including all required acknowledgments and license/author notices).
All of these libraries are rather small (overall it should be less than 3K LOC).

The current code from Build can be transformed into a script to download and patch (shade) the sources.
In my opinion the sources of a core library should be fully transparent: no one should have to dig through third-party sources to understand how the REPL printing works.

Also, by storing the sources in this repo it would be easier to compare changes in case of upcoming upgrades. The same approach is used in Scala Native for dealing with core dependencies like libunwind (see its download-and-patch script).

It would also make it easier to introduce bug fixes or adjustments if needed without forcing a new release of pprint or its dependencies.

@lihaoyi
Contributor Author

lihaoyi commented Sep 1, 2025

@WojciechMazur the tradeoff between vendoring the source v.s. downloading it on the fly and patching it is as follows:

  1. Vendoring the source makes it:

    • Easier to look at the source at any point in time - it's right there.
    • Harder to upgrade: because every time you want to upgrade, you need to first diff the current version v.s. the current upstream and then re-apply it to the newer version
  2. Downloading and patching makes it:

    • Harder to look at the source at any point in time: you need to look inside compiler/target/src_managed/
    • Easier to upgrade: usually you can just bump the downloaded version and the String#replace calls will just work

There's no strictly better answer, but I think the tradeoffs of downloading and patching are more useful: it's not that hard to look inside compiler/target/src_managed if you want to see the sources, and if you want to see the evolution of pprint itself you can just look at the pprint codebase. And not having to reverse-engineer a diff and manually re-apply it every time you want to upgrade seems very useful.
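
To give a feel for the "download and patch" side of this tradeoff, here is a deliberately naive sketch of a String#replace-based shading step. The function shade and the exact replacement patterns are assumptions for illustration; the real patches in the PR may differ, and plain textual replacement would miss cases such as fully qualified references.

```scala
/** Hypothetical shading step: move a library's root package under `dotty.shaded`.
  * Only `package` clauses and `import` prefixes are rewritten.
  */
def shade(source: String, rootPackage: String): String =
  source
    .replace(s"package $rootPackage", s"package dotty.shaded.$rootPackage")
    .replace(s"import $rootPackage.", s"import dotty.shaded.$rootPackage.")

// Usage: apply the step once per shaded library.
val original = "package pprint\nimport fansi.Str\n"
val shaded   = Seq("pprint", "fansi").foldLeft(original)(shade)
```

Under this scheme an upgrade amounts to bumping the downloaded version, provided the replaced patterns still occur in the new sources.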

@soronpo
Contributor

soronpo commented Sep 1, 2025

Given you are aiming for a source dependency instead of a binary one, consider the possible copyright side effects.

@He-Pin
Contributor

He-Pin commented Sep 1, 2025

I think the current approach is fine, or could we use a git submodule?

@@ -5,7 +5,7 @@ scala> NInt(23)
val res0: NInt = NInt@17

scala> res0.toString
-val res1: String = NInt@17
+val res1: String = "rs$line$1$NInt@17"
Contributor Author

@lihaoyi lihaoyi Sep 1, 2025

Not sure what's happening here before this PR, but there must be some super sketchy stdout-regexing happening to mangle the .toString so it looks different when returned or println-ed.

scala> res0.toString
val res1: String = NInt@17
                                                                                                                          
scala> println(res0.toString)
rs$line$1$NInt@17

The new behavior is probably better: we special case returning because it uses pprint, println is just println, and if someone wants pprint themselves they can use dotty.shaded.pprint.log


scala>:settings -Vrepl-max-print-characters:10

scala> 1.to(10).mkString
-val res1: String = 123456789 ... large output truncated, print value to show all
+val res1: String = "12345678910"
Contributor Author

The old -Vrepl-max-print-elements and -Vrepl-max-print-characters:10 don't work with PPrint. Instead, we can control the max width and height before truncation. As a first pass I'd say we can do that in a follow up, but if people really want I can add those -Vrepl-width and -Vrepl-height in this PR

@lihaoyi
Contributor Author

lihaoyi commented Sep 1, 2025

The current failure doesn't seem to have any error message that I can find. Can anyone help me take a look and see what's wrong? https://github.com/scala/scala3/actions/runs/17377385477/job/49326734244?pr=23849

@som-snytt
Contributor

Since the project REPL is now primarily for project use, I'd suggest keeping the REPL output simple, and add a :show res0 command instead for special printing.

Then the currently absent :javap/:asmp could be implemented as :show Foo.class. Maybe :print is a better name.

There are already issues with reproduction for tickets involving the usual REPL snippet wrapping and importing from history, let alone scala-cli conventions (which affect option processing, besides the directives per se), so that it would be too bad to also cope with rendering issues.

@som-snytt
Contributor

The test says it's testing

+ ./bin/scala -classpath /tmp/tmp.AwAD4O5uus -M HelloWorld

is that due to

+ /__w/scala3/scala3/project/scripts/../../project/scripts/sbt dist-linux-x86_64/Universal/stage

at 8750

@lihaoyi
Contributor Author

lihaoyi commented Sep 1, 2025

If someone wants the raw toString of a value, it's only a single println away.

Furthermore, as you can see the default REPL is nowhere near simple: it does exactly the same thing that PPrint does! It recursively traverses Seqs, truncates Lists, quotes strings, sanitizes ansi codes, manually prints Arrays, etc. The underlying approach is basically identical: a runtime recursive traversal of Any for common data types to make them more usefully readable when printed out than the default toString would be

PPrint just does the same thing better, with its 10-year-old implementation being significantly prettier than ScalaRunTime's 20-year-old implementation. For common data structures the output should be more obvious than the default one; for example, you can now visually differentiate:

  • null and "null"
  • List("") and Nil
  • "\u001b[31m" and "31m"

And for non-common data types it falls back to the ScalaRunTime renderer so the output should be mostly the same as before
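
The "runtime recursive traversal of Any" described above can be sketched in a few lines. This is an illustrative toy in Scala 3 syntax, not the PPrint or ScalaRunTime implementation: render and escapeChar are hypothetical names, only List and case classes are handled structurally, and everything else falls back to toString.

```scala
/** Escapes characters that would otherwise be invisible or ambiguous when printed. */
def escapeChar(c: Char): String = c match
  case '"'  => "\\\""
  case '\\' => "\\\\"
  case '\n' => "\\n"
  case '\t' => "\\t"
  case _ if c.isControl => "\\" + "u%04x".format(c.toInt)
  case _    => c.toString

/** Toy recursive renderer: quotes strings and chars, recurses into lists and products. */
def render(x: Any): String = x match
  case null        => "null"
  case s: String   => "\"" + s.flatMap(escapeChar) + "\""
  case c: Char     => "'" + escapeChar(c) + "'"
  case xs: List[?] => xs.map(render).mkString("List(", ", ", ")")
  case p: Product if p.productArity > 0 =>
    p.productIterator.map(render).mkString(p.productPrefix + "(", ", ", ")")
  case other       => other.toString // fallback for anything not special-cased
```

With quoting and escaping applied at every level, render(null) and render("null") differ, as do render(List("")) and render(Nil), and an ANSI escape sequence shows up as visible text instead of changing terminal colours.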
